dropout network
Neuron-Specific Dropout: A Deterministic Regularization Technique to Prevent Neural Networks from Overfitting & Reduce Dependence on Large Training Samples
In order to develop complex relationships between their inputs and outputs, deep neural networks train and adjust large number of parameters. To make these networks work at high accuracy, vast amounts of data are needed. Sometimes, however, the quantity of data needed is not present or obtainable for training. Neuron-specific dropout (NSDropout) is a tool to address this problem. NSDropout looks at both the training pass, and validation pass, of a layer in a model. By comparing the average values produced by each neuron for each class in a data set, the network is able to drop targeted units. The layer is able to predict what features, or noise, the model is looking at during testing that isn't present when looking at samples from validation. Unlike dropout, the "thinned" networks cannot be "unthinned" for testing. Neuron-specific dropout has proved to achieve similar, if not better, testing accuracy with far less data than traditional methods including dropout and other regularization methods. Experimentation has shown that neuron-specific dropout reduces the chance of a network overfitting and reduces the need for large training samples on supervised learning tasks in image recognition, all while producing best-in-class results.
Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks
Shevchenko, Alexander, Mondelli, Marco
The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and depend linearly on the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.
Mean field theory for deep dropout networks: digging up gradient backpropagation deeply
Huang, Wei, Da Xu, Richard Yi, Du, Weitao, Zeng, Yutian, Zhao, Yunce
In recent years, the mean field theory has been applied to the study of neural networks and has achieved a great deal of success. The theory has been applied to various neural network structures, including CNNs, RNNs, Residual networks, and Batch normalization. Inevitably, recent work has also covered the use of dropout. The mean field theory shows that the existence of depth scales that limit the maximum depth of signal propagation and gradient backpropagation. However, the gradient backpropagation is derived under the gradient independence assumption that weights used during feed forward are drawn independently from the ones used in backpropagation. This is not how neural networks are trained in a real setting. Instead, the same weights used in a feed-forward step needs to be carried over to its corresponding backpropagation. Using this realistic condition, we perform theoretical computation on linear dropout networks and a series of experiments on dropout networks. Our empirical results show an interesting phenomenon that the length gradients can backpropagate for a single input and a pair of inputs are governed by the same depth scale. Besides, we study the relationship between variance and mean of statistical metrics of the gradient and shown an emergence of universality. Finally, we investigate the maximum trainable length for deep dropout networks through a series of experiments using MNIST and CIFAR10 and provide a more precise empirical formula that describes the trainable length than original work.
Global SNR Estimation of Speech Signals using Entropy and Uncertainty Estimates from Dropout Networks
Aralikatti, Rohith, Margam, Dilip, Sharma, Tanay, Abhinav, Thanda, Venkatesan, Shankar M
This paper demonstrates two novel methods to estimate the global SNR of speech signals. In both methods, Deep Neural Network-Hidden Markov Model (DNN-HMM) acoustic model used in speech recognition systems is leveraged for the additional task of SNR estimation. In the first method, the entropy of the DNN-HMM output is computed. Recent work on bayesian deep learning has shown that a DNN-HMM trained with dropout can be used to estimate model uncertainty by approximating it as a deep Gaussian process. In the second method, this approximation is used to obtain model uncertainty estimates. Noise specific regressors are used to predict the SNR from the entropy and model uncertainty. The DNN-HMM is trained on GRID corpus and tested on different noise profiles from the DEMAND noise database at SNR levels ranging from -10 dB to 30 dB.
What my deep model doesn't know... Yarin Gal - Blog Cambridge Machine Learning Group
I come from the Cambridge machine learning group. More than once I heard people referring to us as "the most Bayesian machine learning group in the world". I mean, we do work with probabilistic models and uncertainty on a daily basis. Maybe that's why it felt so weird playing with those deep learning models (I know, joining the party very late). You see, I spent the last several years working mostly with Gaussian processes, modelling probability distributions over functions. I'm used to uncertainty bounds for decision making, in a similar way many biologists rely on model uncertainty to analyse their data. Working with point estimates alone felt weird to me. I couldn't tell whether the new model I was playing with was making sensible predictions or just guessing at random. I'm certain you've come across this problem yourself, either analysing data or solving some tasks, where you wished you could tell whether your model is certain about its output, asking yourself "maybe I need to use more diverse data? or perhaps change the model?". Most deep learning tools operate in a very different setting to the probabilistic models which possess this invaluable uncertainty information, as one would believe. I recently spent some time trying to understand why these deep learning models work so well – trying to relate them to new research from the last couple of years. I was quite surprised to see how close these were to my beloved Gaussian processes. I was even more surprised to see that we can get uncertainty information from these deep learning models for free – without changing a thing. Update (29/09/2015): I spotted a typo in the calculation of \tau; this has been fixed below.